The State Based Mixture of Experts HMM with Applications to the Recognition of Spontaneous Speech

Author

Andreas Tuerk

Abstract

Although the performance of speech recognition systems has increased substantially over the last decades, there still remain a number of tasks which pose considerable problems for current state-of-the-art techniques. One of these tasks is the recognition of spontaneous speech, which differs from read or planned speech in that its underlying dynamics change frequently over time. The negative effect of changes in acoustic background condition on recognition performance can also be observed in other situations, for instance in the case of speech that is corrupted by non-stationary noise. This thesis is concerned with the development of an acoustic model for speech recognition which automatically detects changes in the background condition of a signal and compensates for the model-data mismatch by combining the information of several expert models. These experts are specialised for the different acoustic conditions under consideration, and their influence on the recognition process is determined by how well their associated condition matches the signal. This approach gives rise to the state based mixture of experts hidden Markov model (SBME-HMM), which is studied in this thesis both theoretically and in a number of recognition experiments. In principle, the SBME-HMM can be applied to distinguish implicitly between an arbitrary set of discrete acoustic conditions. Since, however, the main focus of this thesis is the application of the SBME-HMM to spontaneous speech, the only conditions considered here correspond to speech at different speaking rates. The SBME-HMM is an extension of the standard HMM which uses an additional hidden variable whose states are meant to correspond to the different acoustic conditions in the speech signal. The decision whether an acoustic condition is present is implemented in the SBME-HMM via a so-called indicating feature, which is a continuous feature that contains information about the state of the hidden variable. This information is expressed in the SBME-HMM by a posterior probability distribution over the states of the hidden variable given the indicating feature.

The theoretical development of the SBME-HMM in this thesis concerns both the estimation of the model parameters with the EM algorithm and its application in speech recognition. Special attention is given to the estimation of the posterior probability distributions over the states of the hidden variable, which are modelled by softmax functions with polynomial exponents. It is shown that training these functions with the EM algorithm leads to an optimisation problem which can be linked to the cross-entropy error function. Although there are no closed form reestimation formulae for the softmax parameters, they can be estimated robustly with an iterative scheme like the Newton-Raphson algorithm with line search and back-tracking. This is due to the convexity of the cross-entropy error surface, which is established by proving the positive definiteness of the Hessian of the cross-entropy error with respect to the softmax parameters. In addition, this thesis addresses the problem of initialising the SBME-HMM and develops two different methods, namely the median split initialisation (MSI) and the relabelled training initialisation (RTI), which can initialise indicating features whose output probability density functions (pdfs) are either Gamma densities or Gaussian mixtures. For recognition three different types of model topology are proposed that either use the state of the hidden variable explicitly in the recogni-
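The softmax estimation step described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation, and it rests on several assumptions that do not come from the source: only two hidden states, a scalar indicating feature, the exponent of state 0 pinned to zero to remove the usual softmax redundancy, soft targets standing in for the state posteriors an E-step would supply, and a small ridge term that keeps the Hessian positive definite in this toy setting. All function names (`poly_features`, `posterior_state1`, `fit_newton`, and so on) are hypothetical.

```python
import math

def poly_features(x, degree):
    """Return [1, x, ..., x^degree] for a scalar indicating feature x."""
    return [x ** k for k in range(degree + 1)]

def posterior_state1(w, x):
    """P(state 1 | x) for a two-state softmax whose state-0 exponent is 0."""
    z = sum(wk * fk for wk, fk in zip(w, poly_features(x, len(w) - 1)))
    if z >= 0:  # numerically stable logistic
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cross_entropy(w, xs, targets, ridge=1e-6):
    """Cross-entropy between soft targets and model posteriors, plus ridge."""
    eps = 1e-12  # guards log(0); an assumption of this sketch
    total = 0.0
    for x, t in zip(xs, targets):
        p = posterior_state1(w, x)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total + 0.5 * ridge * sum(wk * wk for wk in w)

def solve(a, b):
    """Solve a s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][k] * s[k] for k in range(r + 1, n))) / m[r][r]
    return s

def fit_newton(xs, targets, degree=1, iters=25, ridge=1e-6):
    """Newton-Raphson with a back-tracking line search on the cross-entropy."""
    n = degree + 1
    w = [0.0] * n
    for _ in range(iters):
        grad = [ridge * wk for wk in w]
        hess = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
        for x, t in zip(xs, targets):
            f = poly_features(x, degree)
            p = posterior_state1(w, x)
            for i in range(n):
                grad[i] += (p - t) * f[i]
                for j in range(n):
                    hess[i][j] += p * (1.0 - p) * f[i] * f[j]
        if max(abs(g) for g in grad) < 1e-8:
            break  # gradient is (numerically) zero: converged
        step = solve(hess, grad)
        loss, scale = cross_entropy(w, xs, targets, ridge), 1.0
        while scale > 1e-8:  # halve the step until the loss does not increase
            trial = [wk - scale * sk for wk, sk in zip(w, step)]
            if cross_entropy(trial, xs, targets, ridge) <= loss:
                w = trial
                break
            scale *= 0.5
    return w
```

Calling `fit_newton(xs, targets, degree=2)` with a list of indicating-feature values and matching soft state posteriors would fit a quadratic exponent; the general multi-state case would need one parameter vector per non-pinned state and a block-structured Hessian.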

Similar resources

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

A New Fast and Efficient HMM-Based Face Recognition System Using a 7-State HMM Along With SVD Coefficients

In this paper, a new Hidden Markov Model (HMM)-based face recognition system is proposed. As a novel point, instead of the five-state HMM used in previous research, we use a 7-state HMM to cover more detail. Indeed, we add two new face regions, eyebrows and chin, to the model. As another novel point, we use a small number of quantized Singular Value Decomposition (SVD) coefficients as feature...

Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....

Presentation of K Nearest Neighbor Gaussian Interpolation and comparing it with Fuzzy Interpolation in Speech Recognition

The Hidden Markov Model is a popular statistical method that is used in continuous and discrete speech recognition. The probability density function of observation vectors in each state is estimated with discrete density or continuous density modeling. The performance (in correct word recognition rate) of the continuous density HMM is higher than that of the discrete density HMM, but its computation complexity is very ...

Publication date: 2001
